Skip to content

loader: handle struct.error from dnfile and raise CorruptFile with a clear message#2872

Open
devs6186 wants to merge 2 commits intomandiant:masterfrom
devs6186:fix/2442-dnfile-struct-error
Open

loader: handle struct.error from dnfile and raise CorruptFile with a clear message#2872
devs6186 wants to merge 2 commits intomandiant:masterfrom
devs6186:fix/2442-dnfile-struct-error

Conversation

@devs6186
Copy link
Contributor

closes #2442

When capa analyzes a .NET PE with truncated or malformed .NET metadata, dnfile.dnPE() raises struct.error: unpack_from requires a buffer of at least N bytes. This is caught nowhere and bubbles up as an unhandled exception with no useful user-facing message.

Changes

  • In capa/features/extractors/dnfile/extractor.py and capa/features/extractors/dotnetfile.py, wrapped dnfile.dnPE(...) in try/except struct.error.
  • Re-raises as capa.loader.CorruptFile with the message: "Invalid or truncated .NET metadata; the file may be corrupted or not a valid .NET PE."
  • Added import struct to both files.

Checklist

  • No CHANGELOG update needed
  • No new tests needed
  • No documentation update needed
  • This submission includes AI-generated code and I have provided details in the description.

@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello @devs6186, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the robustness of capa's .NET file analysis by gracefully handling corrupted or malformed .NET metadata. Previously, such files could lead to unhandled struct.error exceptions, causing the analysis to fail abruptly without a clear explanation. The changes introduce specific error trapping and re-raise a more informative exception, providing users with better feedback when encountering problematic .NET executables.

Highlights

  • Robust Error Handling: Implemented error handling for struct.error exceptions that can occur when dnfile.dnPE() attempts to parse malformed or truncated .NET metadata.
  • Clearer User Feedback: When a struct.error is caught, it is now re-raised as a capa.loader.CorruptFile exception with a user-friendly message: "Invalid or truncated .NET metadata; the file may be corrupted or not a valid .NET PE."
  • Dependency Import: The struct module was imported in capa/features/extractors/dnfile/extractor.py and capa/features/extractors/dotnetfile.py to support the new error handling.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Changelog
  • CHANGELOG.md
    • Added a changelog entry for handling struct.error from dnfile and showing a clear CorruptFile message.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

The pull request effectively addresses the struct.error raised by dnfile.dnPE() by wrapping the call in a try/except block and re-raising a CorruptFile exception with a clear, user-friendly message. This improves error handling and provides better feedback for malformed .NET PE files. The changes are applied consistently across both capa/features/extractors/dnfile/extractor.py and capa/features/extractors/dotnetfile.py, and the CHANGELOG.md has been updated appropriately. The addition of import struct is also correct.

@devs6186
Copy link
Contributor Author

Thanks for the review! Glad the approach looks right.

One small thing worth calling out on the DnfileFeatureExtractor side: super().__init__() is intentionally called after the dnPE() call rather than before, because computing SampleHashes.from_bytes() in the parent is cheap, but if parsing blows up on a malformed file it's cleaner to fail fast before we've done any work. The ordering in DotnetFileFeatureExtractor already had super().__init__() first (since it reads hashes independently), so I kept that as-is.

No other changes needed based on the feedback — both extractors are now consistent in how they surface this error to the caller.

@mr-tz
Copy link
Collaborator

mr-tz commented Feb 22, 2026

Would a helper function make sense to only write the except handling code once?
I'd also like to see the provided exception details vs. assuming it's a specific parsing error. What do you think?

@devs6186
Copy link
Contributor Author

Would a helper function make sense to only write the except handling code once? I'd also like to see the provided exception details vs. assuming it's a specific parsing error. What do you think?

Well both suggestions make sense and I can incorporate them.

  1. Helper function
    Agreed. Duplicating the try/except struct.error in extractor.py and dotnetfile.py isn’t ideal. I’ll add a small helper (e.g. in loader or in a shared dnfile helper used by both) that performs the dnPE() call and centralizes the handling. Both call sites will then use that helper so the except logic lives in one place.
  2. Exception details
    Also agreed. Using a fixed message (“Invalid or truncated .NET metadata…”) hides the real cause. I’ll change it so we surface the underlying error, e.g. by including str(e) in the CorruptFile message, e.g.
    "Invalid or truncated .NET metadata: {original_message}"
    (or similar wording). That way we don’t assume it’s “a specific parsing error” and users get the actual reason.
    Just let me know, ill update these changes and push a commit!

@mike-hunhoff
Copy link
Collaborator

@devs6186 CI tests are failing:

isort....................................................................Failed
- hook id: isort
- files were modified by this hook

Fixing /home/runner/work/capa/capa/capa/features/extractors/dotnetfile.py
Skipped 4 files

All changes made by hooks:
diff --git a/capa/features/extractors/dotnetfile.py b/capa/features/extractors/dotnetfile.py
index 3c3d83b..919f4ff 100644
--- a/capa/features/extractors/dotnetfile.py
+++ b/capa/features/extractors/dotnetfile.py
@@ -12,8 +12,8 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
-import logging
 import struct
+import logging
 from typing import Iterator
 from pathlib import Path

Please ensure all unit tests are passing locally before requesting another review, thank you!

@devs6186
Copy link
Contributor Author

@devs6186 CI tests are failing:

isort....................................................................Failed
- hook id: isort
- files were modified by this hook

Fixing /home/runner/work/capa/capa/capa/features/extractors/dotnetfile.py
Skipped 4 files

All changes made by hooks:
diff --git a/capa/features/extractors/dotnetfile.py b/capa/features/extractors/dotnetfile.py
index 3c3d83b..919f4ff 100644
--- a/capa/features/extractors/dotnetfile.py
+++ b/capa/features/extractors/dotnetfile.py
@@ -12,8 +12,8 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
-import logging
 import struct
+import logging
 from typing import Iterator
 from pathlib import Path

Please ensure all unit tests are passing locally before requesting another review, thank you!

yes @mike-hunhoff i just noticed, ill take a days time not more than that and look into all the comments and changes requested for this pr and all the others and get back to you asap, thank you for your patience !

…ssage

When .NET metadata is truncated or invalid, dnfile can raise struct.error.
Catch it in DnfileFeatureExtractor and DotnetFileFeatureExtractor and
re-raise as CorruptFile with a user-friendly message.

Fixes mandiant#2442
@devs6186 devs6186 force-pushed the fix/2442-dnfile-struct-error branch from c4b0563 to da59237 Compare February 24, 2026 03:47
@mike-hunhoff
Copy link
Collaborator

Would a helper function make sense to only write the except handling code once? I'd also like to see the provided exception details vs. assuming it's a specific parsing error. What do you think?

Well both suggestions make sense and I can incorporate them.

  1. Helper function
    Agreed. Duplicating the try/except struct.error in extractor.py and dotnetfile.py isn’t ideal. I’ll add a small helper (e.g. in loader or in a shared dnfile helper used by both) that performs the dnPE() call and centralizes the handling. Both call sites will then use that helper so the except logic lives in one place.
  2. Exception details
    Also agreed. Using a fixed message (“Invalid or truncated .NET metadata…”) hides the real cause. I’ll change it so we surface the underlying error, e.g. by including str(e) in the CorruptFile message, e.g.
    "Invalid or truncated .NET metadata: {original_message}"
    (or similar wording). That way we don’t assume it’s “a specific parsing error” and users get the actual reason.
    Just let me know, ill update these changes and push a commit!

@devs6186 we'll give this another review once you've addressed these changes, thank you!

Add load_dotnet_image() to dnfile/helpers.py that calls dnfile.dnPE()
and catches struct.error, raising CorruptFile with the original error
message included (f"Invalid or truncated .NET metadata: {e}").

Both DnfileFeatureExtractor and DotnetFileFeatureExtractor now call the
helper instead of duplicating the try/except block, and their direct
import of struct is removed.

Addresses review feedback on mandiant#2872.
@devs6186
Copy link
Contributor Author

Would a helper function make sense to only write the except handling code once? I'd also like to see the provided exception details vs. assuming it's a specific parsing error. What do you think?

Well both suggestions make sense and I can incorporate them.

  1. Helper function
    Agreed. Duplicating the try/except struct.error in extractor.py and dotnetfile.py isn’t ideal. I’ll add a small helper (e.g. in loader or in a shared dnfile helper used by both) that performs the dnPE() call and centralizes the handling. Both call sites will then use that helper so the except logic lives in one place.
  2. Exception details
    Also agreed. Using a fixed message (“Invalid or truncated .NET metadata…”) hides the real cause. I’ll change it so we surface the underlying error, e.g. by including str(e) in the CorruptFile message, e.g.
    "Invalid or truncated .NET metadata: {original_message}"
    (or similar wording). That way we don’t assume it’s “a specific parsing error” and users get the actual reason.
    Just let me know, ill update these changes and push a commit!

@devs6186 we'll give this another review once you've addressed these changes, thank you!

i have addressed them now in the latest commit

@mike-hunhoff
Copy link
Collaborator

@devs6186 lints are failing. Please ensure all lints pass locally before requesting another review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

struct.error: unpack_from requires a buffer of at least 4 bytes for unpacking 4 bytes at offset 0 (actual buffer size is 0)

3 participants